-
Notifications
You must be signed in to change notification settings - Fork 352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some work participant discovery #2086
Open
eboasson
wants to merge
13
commits into
eclipse-cyclonedds:master
Choose a base branch
from
eboasson:rework-spdp
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This replaces the per-participant timed-event for periodically broadcasting its SPDP message by a central device that schedules and sends the SPDP messages for all participants and maintains the set of addresses to which to send them. There are a number of benefits to this model: * We can make it more likely (certain, if we want) that SPDP messages to a single destination go out in a single packet, reducing the packet rate if there are many participants. * We are no longer dependent on an "addrset" for tracking the set of addresses to publish to (and we don't need to special-case SPDP writers). That means we now have the option of "aging" locators and lowering the frequency or giving up on them altogether if there just doesn't seem to be anything at that address.
This eliminates looking up the SPDP sample in the WHC, which became rather painful with the introduction of serdata_plist, and eliminates the problem of the dispose+unregister not getting stored in the WHC (removed from the index, best-effort so not retained until ACKs come in). It also solves having to construct an address set based on the addresses in the SPDP administration and patching that into the SPDP writer, which was a horrible hack to begin with.
The old default of 9 (so 10 participants) was hit rather often. This raises the default to 99 (so 100 participants). The only real downside is that increases the number of SPDP packets 10x if unicast discovery is used.
Signed-off-by: Erik Boasson <[email protected]>
If one adds a host to the peer list, we add MaxAutoParticipantIndex+1 locators for it, but it generally doesn't make much sense to keep pinging all of them if there is no response. Especially if MaxAutoParticipantIndex is large, it causes a lot of periodic traffic.
SPDP locators can be added by the configuration but also by discovering a peer that we (presumably) cannot reach via multicast. The question is what to do with such a discovered address when the peer is no longer there. If we learnt a locator through discovery and we know for certain that the peer is no longer there (i.e., received a dispose/unregister) and that there are also no other peers at the locator, it makes sense to remove the address: a new peer that shows up at that locator would presumably try to contact us. If instead the peer's lease expired, we need to keep pinging it for a some time in case it was just the disconnect. So for each SPDP locator we need to know why we have it in the table of addreses. Signed-off-by: Erik Boasson <[email protected]>
This replaces the somewhat silly (and ancient) tracing of the queue length on every insert.
Signed-off-by: Erik Boasson <[email protected]>
Signed-off-by: Erik Boasson <[email protected]>
Signed-off-by: Erik Boasson <[email protected]>
Replaces src/core/xtests/spdp_scenarios.bash
eboasson
force-pushed
the
rework-spdp
branch
from
September 25, 2024 18:02
33eb9b6
to
31e30cb
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR is somewhat massive, but it is primarily driven by a simple thing: make it possible to drop locators from SPDP when no participant seems to exist at that address. I've always refused to do that because of some edge cases in asymmetrical discovery and reconnecting after network interrupts, but the problem is that it also meant that
MaxAutoParticipantIndex
had to remain small to prevent massive bursts of periodic traffic, but that low limit caused enough trouble to make people prefer those bursts. The ROS 2 navigation stack suffers from this in particular: ros2/rmw_cyclonedds#458Thus, this makes it prune locators after some time, distinguishing between peers specified in the configuration and locators discovered at run-time, and handling locators of participants of which the lease expired differently from those that terminated cleanly.
It also flips the scheduling of SPDP messages around: it used to have a timed event for each local participant and send to all known locators in one burst; now it scans a table of locators and sends SPDP messages for all participants at the same time. The advantage should be packing; most people will never notice because most applications only have a single participant ...
It's been working just fine for months on my laptop, but there are many configurations out there and I have not been using all of them. I'm sure there are still issues with it, but I think the time has come to just merge it and see what breaks. (Perhaps nothing, after all!)
I've made a PR because I want to be done with it 🙂 but have marked it as a draft because I still need to turn the handful of test scenarios in a bash script into tests that run as part of the regular CI.